🌱 Update boxcutter integration and add cross-CE collision regression test#2764
perdasilva wants to merge 8 commits into
✅ Deploy Preview for olmv1 ready!
ddb7a76 to fe0acea
Pull request overview
This PR updates the operator-controller’s Boxcutter integration to treat all active sibling ClusterObjectSet revisions (both lower and higher revision numbers) as relevant “owners”, aiming to avoid false-positive collision reporting during revision handover. It also adds coverage to validate collision behavior when a conflicting ClusterExtension is upgraded.
Changes:
- Switch Boxcutter ownership inputs from “previous revisions only” to “all active sibling revisions” and add unit tests for the sibling listing logic.
- Add an e2e scenario asserting collisions persist even when a conflicting ClusterExtension upgrades to a different version.
- Update Go module dependencies for Boxcutter (and related transitive deps).
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| test/e2e/features/update.feature | Adds an e2e scenario for collision persistence across upgrades (but needs step/param fixes). |
| internal/operator-controller/controllers/revision_engine_factory.go | Adjusts Boxcutter object engine factory invocation to match updated dependency API. |
| internal/operator-controller/controllers/clusterobjectset_controller.go | Introduces sibling revision listing and feeds sibling owners into Boxcutter. |
| internal/operator-controller/controllers/clusterobjectset_controller_internal_test.go | Adds unit tests for listSiblingRevisions. |
| go.mod | Bumps Boxcutter + deps, but currently includes merge-conflict markers and an invalid local replace. |
| go.sum | Updates dependency sums, but currently includes multiple unresolved merge-conflict blocks. |
Comments suppressed due to low confidence (2)
go.mod:7
go.mod still contains unresolved merge-conflict markers (<<<<<<< / ======= / >>>>>>>). This makes the module file invalid and will break go tooling in CI. Resolve the conflict by choosing a single go directive line and removing all conflict markers, then re-run go mod tidy to ensure go.sum is consistent.
go 1.26.3
require (
github.com/BurntSushi/toml v1.6.0
github.com/Masterminds/semver/v3 v3.5.0
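For illustration only, here is a minimal Go sketch of the check this comment implies: it reports the line numbers of Git conflict markers (<<<<<<<, =======, >>>>>>>) in a go.mod body. conflictMarkerLines is a hypothetical helper, not something in the repository or the Go toolchain; in practice any go command will already fail to parse such a file.

```go
package main

import (
	"bufio"
	"strings"
)

// conflictMarkerLines returns the 1-based line numbers of lines that begin
// with a Git merge-conflict marker. Hypothetical helper for illustration.
func conflictMarkerLines(content string) []int {
	markers := []string{"<<<<<<<", "=======", ">>>>>>>"}
	var found []int
	scanner := bufio.NewScanner(strings.NewReader(content))
	for n := 1; scanner.Scan(); n++ {
		line := scanner.Text()
		for _, m := range markers {
			if strings.HasPrefix(line, m) {
				found = append(found, n)
				break // one hit per line is enough
			}
		}
	}
	return found
}
```

Resolving the conflict means the function returns no lines for the file, leaving exactly one go directive.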
go.mod:329
This replace points to a developer-local filesystem path, which will not exist in CI and will break module resolution for anyone else. Repository go.mod files should not contain machine-specific local replaces; rely on the tagged boxcutter version instead.
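As a sketch of how such a replace could be caught before review, the hypothetical helper below scans go.mod text and returns replace targets that are filesystem paths. It follows the Go modules rule that a replacement path that is absolute or begins with ./ or ../ is treated as a local directory. It is not an existing lint, and any module paths fed to it in examples are made up.

```go
package main

import (
	"path/filepath"
	"strings"
)

// localReplaces returns the right-hand side of every replace directive whose
// target is a filesystem path rather than a module path plus version.
// Handles both "replace a => b" and the parenthesized block form.
func localReplaces(gomod string) []string {
	var found []string
	inBlock := false
	for _, raw := range strings.Split(gomod, "\n") {
		line := strings.TrimSpace(raw)
		switch {
		case line == "replace (":
			inBlock = true
			continue
		case inBlock && line == ")":
			inBlock = false
			continue
		case strings.HasPrefix(line, "replace "):
			line = strings.TrimPrefix(line, "replace ")
		case !inBlock:
			continue // not part of any replace directive
		}
		_, target, ok := strings.Cut(line, "=>")
		if !ok {
			continue
		}
		fields := strings.Fields(target)
		if len(fields) == 0 {
			continue
		}
		path := fields[0]
		if strings.HasPrefix(path, "./") || strings.HasPrefix(path, "../") || filepath.IsAbs(path) {
			found = append(found, path)
		}
	}
	return found
}
```

Targets of the form module path plus version (for example example.com/c v1.2.3) are ignored; only directory targets are reported.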
fe0acea to ceb6fee
d81b767 to 507f415
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2764 +/- ##
=======================================
Coverage 70.42% 70.42%
=======================================
Files 143 143
Lines 10617 10625 +8
=======================================
+ Hits 7477 7483 +6
- Misses 2579 2580 +1
- Partials 561 562 +1
Flags with carried forward coverage won't be shown.
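The unchanged 70.42% in the coverage diff is consistent with simple arithmetic, sketched below in Go. The sketch assumes coverage is hits / (hits + misses + partials) (the table's Lines column equals that sum in both columns: 7477 + 2579 + 561 = 10617 and 7483 + 2580 + 562 = 10625) and that the report rounds down to two decimals, which we believe is Codecov's default but which should be checked against the project's codecov.yml. Under nearest rounding, the new figure of 7483/10625 ≈ 70.428% would display as 70.43%.

```go
package main

import "fmt"

// coverage returns hits / (hits + misses + partials) as a percentage,
// truncated (rounded down) to two decimal places.
func coverage(hits, misses, partials int) string {
	lines := hits + misses + partials
	hundredths := hits * 10000 / lines // integer division truncates
	return fmt.Sprintf("%d.%02d%%", hundredths/100, hundredths%100)
}
```

Both columns truncate to the same value, which is why eight added lines leave the headline figure unchanged.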
pedjak left a comment
question (non-blocking): The Copilot review on an earlier push mentioned an e2e .feature file change for collision persistence across upgrades, but it's absent from the current diff. Is that planned as a follow-up PR? The unit tests cover listSiblingRevisions well, but an e2e test exercising the higher-sibling handover path would add confidence that the behavioral change works end-to-end.
507f415 to 1dea3d1
pedjak left a comment
Review focused on boxcutter library update and coding style.
1c88b5a to c490544
c490544 to c3a323b
[APPROVALNOTIFIER] This PR is NOT APPROVED. It needs approval from an approver in each of the affected files.
/lgtm
…bjectSet
Verify that a conflicting ClusterExtension with a higher-revision ClusterObjectSet does not take over resources owned by the original ClusterExtension.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New changes are detected. LGTM label has been removed.
pedjak left a comment
Review focused on test structure (unit + e2e) and a minor code comment placement nit.
Then ClusterExtension is available
And ClusterExtension reports Progressing as True with Reason Succeeded
And ClusterExtension reports Installed as True
suggestion: This scenario has three When-Then cycles (install first CE → assert available, apply dup CE → assert collision, update first CE → assert still colliding). Multiple When-Then blocks in a single Gherkin scenario are generally considered an anti-pattern: each scenario should specify one behavior, not a multi-step integration script.
Having multiple transitions makes it harder to diagnose which step failed and why, and the scenario title ("does not take over resources") only describes the final assertion.
Consider splitting into two scenarios:
- Conflicting CE gets collision on install
- Conflicting CE with higher revision still gets collision after update
They could share setup via a Background or a reusable Given step.
Also, this scenario involves updating the CE spec (adding DeploymentConfig) and asserting behavior after the update. That makes it a better fit for update.feature than install.feature. The earlier Copilot review also referenced this being in update.feature — was there a reason to move it here?
Moved the test to update.feature — agreed it fits better there.
On splitting: the "conflicting CE gets collision on install" case is already covered by the scenario right above ("Detect collision when a second ClusterExtension installs the same package after an upgrade"). The value of this scenario is the multi-step sequence: initial collision → spec change producing a higher-revision COS → still collision. Splitting would duplicate the first half of the existing test without adding coverage.
Also added assertions that the original CE remains Installed=True after both collision points, which was missing before.
require.NoError(t, ocv1.AddToScheme(testScheme))

type methodUnderTest struct {
	name string
suggestion: The double loop (test cases × methods) with a map[string][]string for expected results works, but it obscures the arrange-act-assert structure. A reader has to mentally cross-reference the map key with the method definition at the top to understand what any single test case asserts.
An alternative structure that would be clearer:
- Test listOtherActiveRevisions directly with explicit predicates (the real unit under test). This covers the shared filtering logic once.
- Add a couple of trivial tests for listSiblingRevisions and listPreviousRevisions that just verify they delegate with the correct predicate (e.g., one passes all, the other filters by revision number).
This separates "does the shared logic work" from "do the wrappers delegate correctly" and makes each test read as a clean given-when-then.
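A minimal sketch of the structure suggested above, under stated assumptions: revision is a hypothetical stand-in for a ClusterObjectSet with only the fields the filter reads, ownership filtering is assumed to happen in the list call and is omitted, and the three function names come from the discussion while their signatures here are invented. The shared filter takes an explicit predicate so it can be tested once; each wrapper only chooses a predicate.

```go
package main

// revision is a hypothetical, trimmed-down stand-in for a ClusterObjectSet.
type revision struct {
	Name     string
	Number   int64
	Archived bool
	Deleting bool
}

// listOtherActiveRevisions holds the shared filtering logic: it skips the
// current revision, archived or deleting revisions, and anything the
// caller-supplied predicate rejects.
func listOtherActiveRevisions(all []revision, current revision, keep func(revision) bool) []string {
	var names []string
	for _, r := range all {
		if r.Name == current.Name || r.Archived || r.Deleting || !keep(r) {
			continue
		}
		names = append(names, r.Name)
	}
	return names
}

// listSiblingRevisions keeps every other active revision.
func listSiblingRevisions(all []revision, current revision) []string {
	return listOtherActiveRevisions(all, current, func(revision) bool { return true })
}

// listPreviousRevisions keeps only revisions numbered below the current one.
func listPreviousRevisions(all []revision, current revision) []string {
	return listOtherActiveRevisions(all, current, func(r revision) bool {
		return r.Number < current.Number
	})
}
```

With this split, the table-driven cases target listOtherActiveRevisions with explicit predicates, and each wrapper needs only one case to confirm which predicate it passes. The thread later dropped the shared helper, so this reflects the suggestion rather than the code as it ended up in the PR.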
This is addressed — listOtherActiveRevisions and listSiblingRevisions have been removed. The current code only has listPreviousRevisions with a simple single-loop table-driven test. No more double loop or map cross-referencing.
require.ElementsMatch(t, tc.expectedRevs, names)
})
for _, m := range methods {
suggestion: The name format fmt.Sprintf("%s/%s", m.name, tc.name) puts the method name first, but the outer loop iterates over test cases. This means Go test output interleaves methods:
listSiblingRevisions/should exclude self...
listPreviousRevisions/should exclude self...
listSiblingRevisions/should exclude archived...
listPreviousRevisions/should exclude archived...
Swapping to tc.name/m.name (or swapping the loop order) would group output by method, making it easier to scan all cases for one method when debugging a failure.
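The effect of loop order on subtest names can be sketched without running go test: the hypothetical subtestNames helper below builds, in call order, the names each nesting would pass to t.Run. Sequential (non-parallel) subtests are reported in the order t.Run is called, so the first ordering interleaves methods and the second groups them.

```go
package main

import "fmt"

// subtestNames lists the "method/case" names a nested table-driven test
// would pass to t.Run, in call order. With tcOuter=true the test-case loop
// is outermost (the arrangement under review); otherwise methods are outer.
func subtestNames(methods, cases []string, tcOuter bool) []string {
	var names []string
	if tcOuter {
		for _, tc := range cases {
			for _, m := range methods {
				names = append(names, fmt.Sprintf("%s/%s", m, tc))
			}
		}
		return names
	}
	for _, m := range methods {
		for _, tc := range cases {
			names = append(names, fmt.Sprintf("%s/%s", m, tc))
		}
	}
	return names
}
```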
Also addressed — the double loop is gone entirely. Single loop over test cases calling listPreviousRevisions directly.
rev1 := newTestClusterObjectSetInternal(t, "rev-1")
rev1.Finalizers = []string{"test-finalizer"}
rev1.DeletionTimestamp = &metav1.Time{Time: time.Now()}
rev2 := newTestClusterObjectSetInternal(t, "rev-2")
suggestion: The old test marked rev-2 (middle revision) as deleting with currentRev: "rev-3", which tested the case where a middle revision is deleting while both lower and higher revisions exist. The new test moves the deletion to rev-1 (lowest revision). Both validate the "skip deleting" filter, but the old arrangement was a more interesting interleaving — it verified that a deleting revision between two active ones is correctly excluded regardless of its position.
Consider keeping deletion on rev-2 and adjusting the expected results accordingly, or adding a second sub-case for middle-revision deletion.
Addressed — deletion is back on rev-2 (middle revision) with currentRev: "rev-3", matching the original arrangement.
return nil
}

// listSiblingRevisions returns all active revisions belonging to the same ClusterExtension, excluding the current one.
suggestion: The godoc here explains why boxcutter needs siblings (avoiding false collisions during handover). That is caller context — it belongs at the call site in buildBoxcutterPhases where WithSiblingOwners is passed. The method itself just returns "all active non-self revisions for the same owner," which is already clear from its name and signature.
Keeping method-level docs focused on what the method does (not why a specific caller needs it) prevents the comment from becoming stale if another caller is added.
Addressed — listSiblingRevisions and WithSiblingOwners are gone. The remaining listPreviousRevisions godoc only describes what the method does, no caller context.
And ClusterExtension reports Progressing as True with Reason Retrying and Message includes:
"""
revision object collisions
"""
suggestion: The scenario asserts that the dup CE shows collision and that the higher revision still shows collision after update. But it doesn't verify that the original ${NAME} CE remains Available/Installed throughout the conflict. Adding something like And ClusterExtension "${NAME}" reports Installed as True after the dup's collision assertion would strengthen the test — confirming the original owner is unaffected is arguably the most important property to verify here.
Agreed — added ClusterExtension "ce-${SCENARIO_ID}" reports Installed as True after both collision points (initial collision and post-update higher-revision collision).
…riginal CE stays installed
- Move the higher-revision collision test from install.feature to update.feature where it fits better alongside the existing collision scenario.
- Add assertions that the original ClusterExtension remains Installed=True after both collision points.
- Add NamedClusterExtensionReportsCondition and ClusterExtensionHasClusterObjectSets step definitions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test now lives in update.feature only.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
"""
And ClusterExtension is rolled out
And ClusterExtension is available
And ClusterExtension "${NAME}" has 1 ClusterObjectSet
Renamed to owns — ClusterExtension "${NAME}" owns 1 ClusterObjectSet.
"""
Then ClusterExtension reports Progressing as True with Reason Retrying and Message includes:
"""
revision object collisions
Could we assert the full message here?
The full message includes the phase index and colliding object details (GVK, namespace/name with scenario IDs), e.g.:
revision object collisions in phase 0
ConfigMap.v1 ns-<id>/test-configmap-<id> ...
Asserting the full message would be brittle since the object names are scenario-specific. The existing collision test at line 274 uses the same Message includes approach with just the fragment. Happy to add a more specific fragment like "revision object collisions in phase" if that helps.
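A middle ground between a bare fragment and the full message is a pattern that pins the stable prefix, including the phase index, while ignoring the scenario-specific object names. The Go sketch below assumes the message layout quoted above; reportsCollision is a hypothetical helper, not an existing step.

```go
package main

import "regexp"

// collisionFragment matches the stable part of the collision message and
// ignores the scenario-specific object names that follow it.
var collisionFragment = regexp.MustCompile(`revision object collisions in phase \d+`)

// reportsCollision reports whether a condition message describes a
// revision object collision in some phase.
func reportsCollision(message string) bool {
	return collisionFragment.MatchString(message)
}
```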
matchLabels:
  "olm.operatorframework.io/metadata.name": ${CATALOG:test}
"""
Then ClusterExtension reports Progressing as True with Reason Retrying and Message includes:
Given that we have applied two ClusterExtensions in this test, it would be helpful to distinguish visually which one reports the condition:
ClusterExtension "${NAME}-dup" reports Progressing as True with Reason Retrying and Message includes:
Added a comment above the assertion clarifying which CE is being checked: # The conflicting ClusterExtension (${NAME}-dup, now tracked as ${NAME}) should be retrying. The unnamed ClusterExtension reports steps use sc.clusterExtensionName which is the dup at that point, but that's not obvious to a reader.
A fully named variant like ClusterExtension "${NAME}-dup" reports ... with Reason ... and Message includes would require a new step definition that accepts name+type+status+reason+message — seemed like over-engineering for this one test.
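If the suite binds steps with regular expressions, as Godog does, the fully named variant could be a single pattern whose captures carry the name, condition type, status, and reason, with the message still supplied as the doc-string argument. The pattern and parseNamedConditionStep below are hypothetical and use only the standard regexp package; they are not registered with any step framework.

```go
package main

import "regexp"

// namedConditionStep is a hypothetical pattern for the fully named step.
var namedConditionStep = regexp.MustCompile(
	`^ClusterExtension "([^"]+)" reports (\w+) as (True|False|Unknown) with Reason (\w+) and Message includes:$`)

// conditionExpectation holds the values captured from one step line.
type conditionExpectation struct {
	Name, Type, Status, Reason string
}

// parseNamedConditionStep extracts the expectation from a step line, or
// reports false if the line does not name a ClusterExtension.
func parseNamedConditionStep(step string) (conditionExpectation, bool) {
	m := namedConditionStep.FindStringSubmatch(step)
	if m == nil {
		return conditionExpectation{}, false
	}
	return conditionExpectation{Name: m[1], Type: m[2], Status: m[3], Reason: m[4]}, true
}
```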
revision object collisions
"""
# Verify the original ClusterExtension remains installed and unaffected
And ClusterExtension "ce-${SCENARIO_ID}" reports Installed as True
should be
ClusterExtension "${NAME}" reports Installed as True
because it is a very internal detail that NAME follows the ce-${SCENARIO_ID} convention.
Can't use ${NAME} here — after the dup CE is applied, ResourceIsApplied overwrites sc.clusterExtensionName to the dup's name, so ${NAME} resolves to the dup (which is retrying, not installed). We need ce-${SCENARIO_ID} to reference the original CE.
${SCENARIO_ID} is a first-class template variable exposed by the framework, and the ce- prefix is established in CreateScenarioContext (hooks.go:218). The existing TrackCurrentClusterExtensionForCleanup pattern has the same constraint — once you apply a second CE, you lose the ${NAME} reference to the first one.
An alternative would be adding a new template variable (e.g. ${ORIGINAL_NAME}) that TrackCurrentClusterExtensionForCleanup populates, but that feels like over-engineering for one test.
> Can't use ${NAME} here: after the dup CE is applied, ResourceIsApplied overwrites sc.clusterExtensionName to the dup's name, so ${NAME} resolves to the dup (which is retrying, not installed). We need ce-${SCENARIO_ID} to reference the original CE.
We should update ResourceIsApplied or some other e2e test logic so that we can have clearer step descriptions.
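One way to address this, sketched in Go under loud assumptions: keep the first applied ClusterExtension's name in its own field and expose it as the ${ORIGINAL_NAME} variable floated earlier in the thread, so steps no longer depend on the ce-${SCENARIO_ID} convention. scenarioVars and recordApplied are hypothetical; the real framework's context type and template expansion may differ. The sketch uses os.Expand from the standard library for ${VAR} substitution.

```go
package main

import "os"

// scenarioVars is a hypothetical scenario context that remembers both the
// first and the most recently applied ClusterExtension.
type scenarioVars struct {
	scenarioID   string
	originalName string // first ClusterExtension applied in the scenario
	currentName  string // most recently applied ClusterExtension
}

// recordApplied would be called wherever a ClusterExtension is applied.
func (s *scenarioVars) recordApplied(name string) {
	if s.originalName == "" {
		s.originalName = name
	}
	s.currentName = name
}

// expand substitutes the known template variables in a step string.
func (s *scenarioVars) expand(step string) string {
	return os.Expand(step, func(key string) string {
		switch key {
		case "NAME":
			return s.currentName
		case "ORIGINAL_NAME":
			return s.originalName
		case "SCENARIO_ID":
			return s.scenarioID
		}
		return "${" + key + "}" // leave unknown variables untouched
	})
}
```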
"""
Then ClusterExtension "${NAME}" has 2 ClusterObjectSets
# The higher-revision COS (revision 2) should also collide, not take over resources
And ClusterExtension reports Progressing as True with Reason Retrying and Message includes:
Which ClusterExtension reports that?
The conflicting ClusterExtension (the dup, tracked as ${NAME} after ResourceIsApplied). The comment on the line above clarifies this: "The higher-revision COS (revision 2) should also collide, not take over resources." Same pattern as the first collision assertion where I added a clarifying comment in the latest push.
> The conflicting ClusterExtension (the dup, tracked as ${NAME} after ResourceIsApplied). The comment on the line above clarifies this: "The higher-revision COS (revision 2) should also collide, not take over resources." Same pattern as the first collision assertion where I added a clarifying comment in the latest push.
For the reader, it would be better if we could have ClusterExtension ${NAME}-dup ...
@BoxcutterRuntime
@DeploymentConfig
Scenario: A conflicting ClusterExtension with a higher revision does not take over resources from the original owner
I find that the scenario title describes low-level implementation details; we should focus here on the user-visible change, with something like "Cannot install further ClusterExtension referring already installed bundle".
Given that we are adding this test in a PR that is about upgrading the boxcutter version, I am curious whether we lacked such protection before this update. If we already had it, I would add these tests in a separate PR. Does the boxcutter upgrade give us this functionality automatically, or do we need to do something on our side to achieve it?
Renamed to "Cannot install a ClusterExtension that refers to an already installed bundle".
On the scope question: cross-CE collision protection is existing boxcutter behavior — it was there before this PR. This PR changes the sibling owners API from WithSiblingOwners (all active revisions from the same CE) to WithPreviousOwners (only lower-revision ones). That's a same-CE handover concern, not cross-CE. The e2e test was added as a regression test to confirm that cross-CE collision protection still works correctly after the API change, but it's not testing new functionality. Happy to split it out into a separate PR if you prefer.
> Happy to split it out into a separate PR if you prefer.
It would make sense to me, and to merge that first, before upgrading boxcutter.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
e9e567f to a2d7a10
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Description
Adapt the ClusterObjectSet controller to boxcutter v0.14.0's API changes (WithPreviousOwners replaces WithSiblingOwners) and add an e2e regression test for cross-ClusterExtension collision protection.
Changes
- listPreviousRevisions: returns active revisions belonging to the same ClusterExtension with lower revision numbers. Filters out self, archived, deleting, and equal-or-higher revisions.
- buildBoxcutterPhases: calls listPreviousRevisions + WithPreviousOwners.
- NewObjectEngine: passes the new required managedBy parameter, set to FieldOwnerPrefix ("olm.operatorframework.io").
- … listPreviousRevisions.
Reviewer Checklist
🤖 Generated with Claude Code